One strategy for winning a coevolutionary struggle is to evolve rapidly. Most of the literature 
on host-pathogen coevolution focuses on this phenomenon, and looks for consequent evidence of 
coevolutionary arms races. An alternative strategy, less often considered in the literature, is to 
deter rapid evolutionary change by the opponent. To study how this can be done, we construct 
an evolutionary game between a controller that must process information, and an adversary that 
can tamper with this information processing. In this game, a species can foil its antagonist by 
processing information in a way that is hard for the antagonist to manipulate. We show that the 
structure of the information processing system induces a fitness landscape on which the adversary 
population evolves, and that complex processing logic is required to make that landscape rugged. 
Drawing on the rich literature concerning rates of evolution on rugged landscapes, we show how 
a species can slow adaptive evolution in the adversary population. We suggest that this type of 
defensive complexity on the part of the vertebrate adaptive immune system may be an important 
element of coevolutionary dynamics between pathogens and their vertebrate hosts. 



I. INTRODUCTION 

Coevolution is often antagonistic, such that one species 
benefits from the other's loss. Classic examples include 
predators and their prey, and pathogens and their hosts. 
Antagonistic coevolution is commonly thought to result 
in rapid co-evolutionary arms races (1). When partic- 
ipants in a coevolutionary arms race can tamper with 
their opponents' control systems, as microbial pathogens 
do with host immune regulation (2, 3), we might expect 
to see a series of subversion efforts and subsequent coun- 
termeasures deployed over evolutionary time. Thus one 
might expect rapid evolutionary divergence in the sys- 
tems involved in controlling and regulating the attacks 
and defenses used in antagonistic interactions (4). 

However, antagonistic coevolution need not always 
lead to rapid evolutionary change. Mechanisms that pre- 
vent subversion can halt coevolutionary arms races; the 
field of cryptography abounds with examples of such sys- 
tems. In a prescient 1955 letter only recently declassified, 
John Nash anticipated this result (5): 

...for almost all sufficiently complex types 
of enciphering, especially where the instruc- 
tions given by different portions of the key in- 
teract complexly with each other in the deter- 
mination of their ultimate effects on the en- 
ciphering, the mean key computation length 
increases exponentially with the length of the 
key.... As ciphers become more sophisticated 
the game of cipher breaking by skilled teams, 
etc., should become a thing of the past. 

Nash was right; one important example is the RSA
cryptosystem (6), in which two parties' communication
over a network cannot be decoded by adversaries unless
they successfully find the prime factors of a large number.
Prime factorization is widely believed, though not proven, to be
computationally difficult (or in the parlance of computer science,
to have high time complexity), and so the system is effectively
secure. The main insight for RSA was that a mechanism
can be made secure against subversion by using intractability
or complexity as a defense. In this paper we
explore how defensive complexity strategies can be generalized
to domains beyond cryptography, for example
immunology.

To explore the role of defensive complexity in an- 
tagonistic coevolution, we introduce a new evolutionary 
game, the control network game. This game features two 
players, the controller and the adversary. The controller 
aims to respond appropriately to the state of the environ- 
ment. To do this, the controller deploys a control system 
intermediating between sensors that receive a cue about 
the state of the world, and effectors that take an action. 
This control system is codified as a control logic with the 
cue as input and the effector responses as outputs. The 
controller's payoff is a function of the world state and 
the actions taken. The adversary aims to interfere, and 
can do so by tampering with some of the signals in the 
control logic. 

We study what happens when this game is played in 
an evolutionary context. We pay particular attention to 
the case in which the controller must first deploy a con- 
trol logic, and the adversary then has multiple periods 
in which to learn how to manipulate it. Such a state of 
affairs could come about for many reasons. One common 
biological scenario is when learning occurs at the population
level by the mechanism of evolution by natural selection
and evolutionary rates differ, as is the case in a vertebrate
host deploying an immune control system against
rapidly evolving pathogen adversaries (7, 8). This sce- 
nario provides us a well defined and formally tractable 
learning system to study, in which an asexual adversary 
population evolves on a fitness landscape and the relevant 
phenotype space corresponds to the set of possible manipulations
to the control system. Questions of which control
systems are hard to learn reduce to questions about the 
rate of evolution given the fitness landscape induced for 
the adversary by the control system. We can then use 
population genetic analysis to characterize rates of evo- 
lution for typical classes of fitness functions. We show 
that complex control networks can generate sign epista- 
sis in this fitness landscape, thereby carving fitness val- 
leys that must be crossed and reducing the rate at which 
an antagonist population evolves to subvert the network. 
Thus control systems afford defensive complexity against 
natural adversaries if they induce fitness functions for at- 
tempted manipulation that take the form of rugged adap- 
tive landscapes with long deep fitness valleys. Where 
sufficient defensive complexity is in place, antagonistic 
co-evolution can lead to long periods of structural stasis 
instead of rapid change driven by an ongoing arms race. 



II. DEFENSIVE COMPLEXITY OF SIGNALING 
NETWORKS 

To exploit a signaling system, an adversary must (1) 
construct or disrupt signals used in the system, and (2) 
do so in a way that increases its own fitness. In host- 
pathogen interactions, step 1 is often simple. For exam- 
ple, viruses readily perturb the cytokine signaling net- 
work used by the host, by altering gene expression or 
by producing cytokine mimics and antagonists (9, 10). 
The latter problem — manipulating the signals in advan- 
tageous ways — may be much harder. This is the chal- 
lenge we focus on here. To do so, we make the univer- 
sal construction assumption: the adversary can construct 
any signal, but does not know what the signals do. Mak- 
ing the universal construction assumption allows us to 
reformulate the problem of evolving to manipulate the 
host's control network as a learning problem. Defensive 
complexity then reduces to non-learnability of the control 
system by the adversary population. 



A. Control Network Games 

We consider the two-player game between controller 
and adversary illustrated in Figure 1A. In each instance
of the game, a cue contains information about the state 
of the environment. The controller aims to transduce 
this cue into an appropriate response. To do so, the con- 
troller's sensory apparatus detects the cue, and produces 
internal signals that will trigger the controller's response. 
The response is determined by a control logic tuned to
accomplish some task T. This control logic is selected
from the set L(T) of minimal-cost control logics for this 
task, i.e., from a set of control logics that all perform 
optimally on the task T. The adversary aims to alter 
the controller's response and does so by perturbing the 
controller's internal signals. 

We explore a game in which the control logic operates
on two signals $S_1$ and $S_2$. We model the control logic
as a simple branching logic on the signals as described
in the Methods. A fitness landscape is induced by the
example control network as follows: the adversaries attempt
to tamper by up- or down-regulating the signals $S_1$
or $S_2$. Up-regulation or down-regulation of the signals
causes changes to the response value $R_a$. For modeling
purposes we only need to describe the control logic and 
its fitness consequences on the adversary as a function 
of perturbations to the signals. We can do so in a way 
that is equivalent to using a particular family of control 
logics: in the Methods we show the correspondence be- 
tween control logics from this family and control logics 
with inputs described in terms of perturbations to the 
signals. 

We outline two simple examples of control logics and
the fitness landscapes that they induce. Let the fitness
of the adversary be 1 when there is no perturbation to
the control logic. First consider a simple signaling network
that has the control logic $L_s$ and induces a fitness
landscape for the adversary as given by Figure 1B. This
fitness landscape is a slope, with each step toward down-regulating
$S_1$ and up-regulating $S_2$ being progressively
more beneficial for the adversary. Second, contrast this
with a control logic $L_c$ that is more complex in that it
requires more logical operations per conditional IF instruction.
The control logic $L_c$ generates a multi-peaked
fitness landscape as shown in Figure 1C. On this landscape,
any perturbation of a single signal away from the
starting state will decrease the adversary's fitness.
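The contrast between the two landscapes can be made concrete with a small sketch. The numerical fitness values below are illustrative assumptions only: they follow the pattern of response changes (0, plus or minus m, plus or minus 2m) described for the two logics, assume that the adversary's fitness is 1 minus the change in response, and use a hypothetical step size m = 0.1. The helper names (`simple_fitness`, `complex_fitness`, `local_maxima`) are ours.

```python
from itertools import product

# Perturbation of each signal: -1 = down-regulated, 0 = unperturbed, +1 = up-regulated.
STATES = (-1, 0, 1)
M = 0.1  # hypothetical step size m; the text gives no numerical value


def simple_fitness(d1, d2):
    """Slope: each step toward (S1 down, S2 up) adds m to the adversary's fitness."""
    return 1.0 + M * (d2 - d1)


def complex_fitness(d1, d2):
    """Multi-peaked: corners are peaks, single-signal perturbations are valleys."""
    if (d1, d2) == (-1, 1):
        return 1.0 + 2 * M   # global maximum
    if d1 != 0 and d2 != 0:
        return 1.0 + M       # the other three corners
    if d1 == 0 and d2 == 0:
        return 1.0           # unperturbed starting point
    return 1.0 - 2 * M       # exactly one signal perturbed


def neighbors(d1, d2):
    """States reachable by moving one signal one step (down <-> none <-> up)."""
    for e1, e2 in ((d1 - 1, d2), (d1 + 1, d2), (d1, d2 - 1), (d1, d2 + 1)):
        if e1 in STATES and e2 in STATES:
            yield e1, e2


def local_maxima(fitness):
    """Return all states whose fitness strictly exceeds that of every neighbor."""
    return [
        (d1, d2)
        for d1, d2 in product(STATES, repeat=2)
        if all(fitness(d1, d2) > fitness(e1, e2) for e1, e2 in neighbors(d1, d2))
    ]


simple_peaks = local_maxima(simple_fitness)
complex_peaks = local_maxima(complex_fitness)
print("simple landscape peaks:", simple_peaks)
print("complex landscape peaks:", complex_peaks)
```

Under these assumed values the slope has a single peak, whereas the multi-peaked landscape has a peak at every corner and at the unperturbed state, so every single-signal step away from the starting point is downhill.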



B. Rate of learning 

In the case of the simple control logic $L_s$, an evolving
adversary population can readily traverse the monotonic
fitness landscape shown in Figure 1B. Even when mutation
supply is limiting, an adversary population can reach
its fitness optimum by fixing two beneficial mutations in
succession. This is a relatively fast process, since drift or
mutation-selection balance is not necessary to maintain
the single mutation upon which the double mutant can
arise. We can estimate the timescale on which double
fixation occurs. Sequential fixation of two beneficial mu- 
tations takes approximately Ts = \/ \/ ^"^N generations 
on average (11), where $N$ is the population size and $\mu$ is
the mutation rate for each mutation.

FIG. 1 Control network game. A. The controller and adversary play the following game: the controller deploys a control
logic and the adversary tries to subvert it. The controller has a sensor that transduces the cue into signals $S_1$ and $S_2$. The
control logic then processes these signals to determine the response. The payoff to the controller is maximized when the signals
are unperturbed, whereas the fitness of the adversary is some arbitrary function of the responses. Note that this game is
thus generically non-zero-sum. The adversary cannot control either the cue or the response directly, but can manipulate the
response by tampering with the signals $S_1$ and $S_2$. In our game, the adversary can upregulate, downregulate, or leave each
signal unchanged. B. A simple control logic induces a single-peaked fitness landscape on which the adversary can easily evolve to
the global optimum. C. A more complex control logic can induce a multi-peaked fitness landscape that prevents the adversary
from rapidly evolving to the global optimum.

The complex fitness landscape induced by $L_c$ and
shown in Figure 1C has five peaks. One of these ($\Delta S_1$
down, $\Delta S_2$ up) is a global maximum; the others, including
the no-perturbation starting point, are local maxima.
Between these peaks are fitness valleys that the
adversary population must cross. Suppose that the dele- 
terious intermediate in the fitness valley suffers a fitness 
disadvantage of S relative to the no-perturbation state. 
Following Weissman et al. (12), when the population size 

1 A^ 

of the adversary is less than N = ^ log -^ and the selec- 
tive disadvantage greater than 2^fJIs^ the fitness valley 
will tend to be crossed by a process known as stochastic 
sequential fixation. In this process, the deleterious inter- 
mediate initially drifts to fixation against selection, and 
subsequently the beneficial double mutant arises and is 
fixed by selection. In that regime, the expected number 
of generations to cross the fitness valley is approximately 
of order Tc = ii~^ e^^~^^^ . When population size of the 
adversary is larger than — %■ and selective disadvantage 
is lower than s, the system lies in the so-called determin- 
istic regime. In this case, the population is large enough 
that double mutants are created immediately and these
go to fixation. In the deterministic regime the expected 
number of generations to cross the fitness valley is ap- 
proximately Tc' = log f f^ j ■ 
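To see why waiting for a deleterious intermediate to drift to fixation becomes prohibitively slow as the population grows, one can evaluate the standard diffusion approximation for the fixation probability of a single new mutant in a haploid population. This is a sketch under stated assumptions: it uses Kimura's classical formula rather than the specific expressions of ref. (12), and the fitness disadvantage `delta = 0.05` and the population sizes are hypothetical values chosen for illustration.

```python
import math


def fixation_probability(n, s):
    """Diffusion approximation to the fixation probability of a single new
    mutant with selection coefficient s in a haploid population of size n:
    (1 - exp(-2 s)) / (1 - exp(-2 n s))."""
    if s == 0:
        return 1.0 / n  # neutral limit
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


delta = 0.05  # hypothetical fitness disadvantage of the valley intermediate
for n in (25, 50, 100, 200):
    p = fixation_probability(n, -delta)
    print(f"N={n:4d}  P(fix)={p:.3e}")
```

For a deleterious mutant (negative `s`) the denominator grows like exp(2 N delta), so the fixation probability falls off roughly exponentially in the population size, which is the qualitative behavior underlying the exponential crossing time in the stochastic sequential fixation regime.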

On the simple landscape induced by $L_s$, the time to
fixation $T_s$ is of order $1/\sqrt{N}$. On the complex landscape
in the stochastic sequential fixation regime, the time to
fixation $T_c$ is exponential in $N$. Clearly as $N$ gets large,
$T_c \gg T_s$. In other words, it takes far longer for an adversary
population to evolve to manipulate the complex
control logic $L_c$ than it does to evolve to manipulate the
simple logic $L_s$. Note, however, that if the adversary population
were somehow to become large enough to enter
the deterministic regime, this result could in principle
reverse. For extremely large adversary populations with
low cost of ineffective tampering, the complex logic can
be subverted more quickly than the simple logic, since in
the deterministic regime the expected time to fixation $T_c'$
is of order $\log(1/N)$. The complex network in that case
admits an exponential speedup in terms of learning time
compared to the simple one.



III. WHAT MAKES A CONTROL NETWORK 
LEARNABLE? 

We have analyzed two example networks that differ in 
their learnability. The next step is to develop a general 
theory relating the properties of a control logic to the 
learnability of that system. 

A control logic can be described as a set $F$ of
functions, one for each conditional in the branching
logic. Each of these functions $f$ in $F$ takes some input
and returns either 1 (if the formula specifying the
conditional evaluates to True) or 0 (otherwise; see Methods).
In the example of Section II, the control logic would
take the signals $S_1$ and $S_2$ as inputs, and determine the
appropriate response by evaluating each conditional. We
can quantify circuit complexity as follows. The circuit
complexity of branch $f$ of the control logic is simply the
minimum number of ternary logic gates (see Methods)
needed to implement $f$ (13). The circuit complexity of
the full logic $F$ is the maximum circuit complexity over
$f \in F$.

We consider a control logic to be effectively unlearn- 
able by natural selection if the learning time for this logic 
is exponential in the number of signals n. In this case, 
the controller can force the learning time to blow up ex- 
ponentially by adding even a modest number of signals. 
The major result of this section is that one can construct 
an effectively unlearnable control logic with circuit com- 
plexity that is linear in the number of signals. Formally, 



Theorem 1 There exists a control logic on $n$ signals
with circuit complexity $O(n)$ that is learnable only in a number of
generations exponential in $n$.

We prove this theorem in the Appendix; the ba- 
sic intuition is as follows. Think of the n-dimensional 
3x3x3x...x3 hypercube where each dimension repre- 
sents perturbations (down, none, up) to one signal. We 
establish a control logic by which all corners of the hyper- 
cube are global optima and the center is a local optimum. 
All other spaces are fitness valleys. In other words, we 
construct a control logic in which global maxima occur 
only where each and every signal has been altered from 
its default value. To reach a global optimum, an adver- 
sary needs to tamper with n different signals. Then we 
show that from the starting place where signals are left 
unperturbed, the expected number of generations needed 
to produce one of these beneficial n-mutants is exponen- 
tial in n. 
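The hypercube construction can be enumerated directly for small n. The following is a minimal sketch, assuming a fitness of 1 at the unperturbed center, a hypothetical valley cost `DELTA`, and a hypothetical gain `S_GAIN` at the global optima; the function names are ours and the numerical values are for illustration only.

```python
from itertools import product

DELTA = 0.1   # hypothetical fitness cost in the valley
S_GAIN = 0.5  # hypothetical fitness gain at the global optima


def hypercube_fitness(genotype):
    """Adversary fitness on the 3^n grid; entries are -1 (down), 0 (none), +1 (up)."""
    unperturbed = sum(1 for g in genotype if g == 0)
    if unperturbed == len(genotype):
        return 1.0            # center: no tampering, a local optimum
    if unperturbed == 0:
        return 1.0 + S_GAIN   # corner: every signal altered, a global optimum
    return 1.0 - DELTA        # everything else lies in the fitness valley


def summarize(n):
    """Return (grid size, number of global optima, number of valley states,
    minimum number of altered signals needed to reach a global optimum)."""
    grid = list(product((-1, 0, 1), repeat=n))
    best = max(hypercube_fitness(g) for g in grid)
    optima = [g for g in grid if hypercube_fitness(g) == best]
    valley = [g for g in grid if hypercube_fitness(g) < 1.0]
    min_steps = min(sum(1 for x in g if x != 0) for g in optima)
    return len(grid), len(optima), len(valley), min_steps


for n in (2, 3, 4):
    print(n, summarize(n))
```

For every n the optima sit exactly n altered signals away from the starting point, and every path to them passes through valley states, which is the property the theorem exploits.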

This construction is just one simple example. More 
complicated fitness landscapes could lead pathogen pop- 
ulations on detours through a sequence of local maxima, 
delaying convergence to the global maximum. 



IV. DISCUSSION 

We were motivated by considering antagonistic coevo- 
lution such as that between pathogens and the adaptive 
immune system of vertebrates. Our framework suggests 
that the kinds of signaling networks present in the im- 
mune system induce a complex fitness landscape with 
valleys and local maxima for pathogens attempting to 
deceive the immune system. Our results are consistent 
with two broad observations pertaining to immunology. 
First, mice serve as a surprisingly faithful model system 
for uncovering principles of immunological signaling and
control that are also valid in humans even though these
two species diverged 75 million years ago. This relative
stasis is consistent with the predictions of the defensive
complexity model; a sufficiently complex immune
circuitry would limit the extent to which rapidly evolving
viral and bacterial pathogens provoke coevolutionary
arms races that in turn would drive divergence between
the immune systems of mice and men. Second, immunologists
have found it considerably difficult to decipher the
rules behind the functioning of the immune system. Defensive
complexity might be expected to give rise to
complex rules which are difficult for pathogens to exploit
and immunologists to understand. The next step will be
to quantify the extent to which these observations arise 
from defensive complexity rather than other factors. 



V. METHODS 

We implement the control logics for our example networks
using Kleene's three-valued logic; for truth tables
in Kleene's logic, see ref. (14). We denote the perturbations
to signal $i$ with $\Delta S_i$. Each perturbation can take on
values $+$, $-$ or $\bullet$. When instantiating the control logic,
we interpret $+$ as 1, $-$ as 0, and $\bullet$ as an input being
absent or unknown. For example, in the $\neg$ (NOT) operation,
when given $\bullet$ as input, the output is also $\bullet$. The AND
operator will evaluate to True only if both its inputs are
True. The OR operator will evaluate to True whenever
at least one of its inputs is True. The branching
program terminates as soon as a particular
conditional evaluates to True; the rest of the conditionals
then cannot be triggered, and so the order of specification
for the conditionals matters a great deal.
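The gates and the first-match branching rule described above can be sketched as follows. This is an illustrative implementation of Kleene's strong three-valued connectives, with Python's `None` standing in for the absent or unknown input; the function names (`k_not`, `k_and`, `k_or`, `k_xor`, `run_branching`) and the toy responses in the usage lines are hypothetical and are not taken from the example networks.

```python
# Truth values: True, False, and None for the absent/unknown input.

def k_not(a):
    """Kleene NOT: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a, b):
    """Kleene AND: False dominates; True only if both inputs are True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene OR: True dominates; False only if both inputs are False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def k_xor(a, b):
    """XOR built from the gates above; unknown if either input is unknown."""
    return k_and(k_or(a, b), k_not(k_and(a, b)))


def run_branching(conditionals, default):
    """Return the response paired with the first conditional that is True."""
    for value, response in conditionals:
        if value is True:
            return response
    return default


# Toy usage: the first conditional is unknown, so the second one fires.
chosen = run_branching([(k_and(True, None), "r1"), (k_or(True, None), "r2")], "r0")
print(chosen)
```

Because evaluation stops at the first conditional that is True, swapping the order of two conditionals that can both be satisfied changes the response, which is why the order of specification matters.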

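We use the notation $\wedge$ for the AND logic gate, $\vee$
for the OR logic gate, $\oplus$ for XOR, and $\neg$ for
NOT. We can implement the control logic $L_s$ as follows:

1. IF $\Delta S_1 \wedge \neg\Delta S_2$, $\Delta R_a = +2m$

2. IF $\Delta S_1 \oplus \neg\Delta S_2$, $\Delta R_a = +m$

3. IF $\Delta S_2 \wedge \neg\Delta S_1$, $\Delta R_a = -2m$

4. IF $\Delta S_2 \oplus \neg\Delta S_1$, $\Delta R_a = -m$

5. ELSE $\Delta R_a = +0$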

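We can implement control logic $L_c$ as follows:

1. IF $\Delta S_2 \wedge \neg\Delta S_1$, $\Delta R_a = -2m$

2. IF $(\Delta S_1 \wedge \Delta S_2) \vee (\Delta S_1 \wedge \neg\Delta S_2) \vee (\neg\Delta S_1 \wedge \neg\Delta S_2)$,
$\Delta R_a = -m$

3. IF $(\Delta S_1 \oplus \neg\Delta S_2) \vee (\Delta S_2 \oplus \neg\Delta S_1)$, $\Delta R_a = +2m$

4. ELSE $\Delta R_a = +0$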

We have thus far discussed control logics in terms of
perturbations to signals, but this is only a way of simplifying
the full control logic for the purpose of discussion.
In fact, we can construct control logics equivalent to the
perturbation-based control logics based on the following
substitutions. We say that $\bullet$ corresponds to the default
value $d_i(c)$ that is set by the control logic to signal $i$ based
on the value of the cue $c$ (more explicitly, by the sensor,
based on the value of $c$), with $+$ and $-$ corresponding
to any quantity larger than $d_i(c)$ and smaller than $d_i(c)$
respectively. Consider $h_i$ and $l_i$, which satisfy the inequality
$h_i > d_i(c) > l_i$ for all $i$. We can translate each
gate over perturbations into a gate over the signals and
the original cue. For $\neg\Delta S_i$ we have

$$
g_{\neg}(S_i, c) =
\begin{cases}
l_i, & \text{if } S_i > d_i(c) \\
d_i(c), & \text{if } S_i = d_i(c) \\
h_i, & \text{if } S_i < d_i(c)
\end{cases}
$$

For $\Delta S_1 \wedge \Delta S_2$ (with a ternary AND gate), we have
$g_{\wedge}(S_1, S_2, c) = (S_1 > d_1(c)) \wedge (S_2 > d_2(c))$, where $\wedge$
is the ordinary boolean AND gate. For $\Delta S_1 \vee \Delta S_2$,
we have $g_{\vee}(S_1, S_2, c) = (S_1 + S_2 > d_1(c) + d_2(c))$. For
$\Delta S_1 \oplus \Delta S_2$, we have
$g_{\oplus}(S_1, S_2, c) = g_{\wedge}(S_1, g_{\neg}(S_2, c), c) \vee g_{\wedge}(g_{\neg}(S_1, c), S_2, c)$,
where $\vee$ is the ordinary boolean OR
gate.
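The substitutions above translate directly into functions over raw signal values. The sketch below is illustrative and makes simplifying assumptions: the cue is held fixed, so the defaults are passed in as plain numbers `d1`, `d2` rather than as functions of the cue, and the high and low values `h`, `l` and the example signal levels are hypothetical.

```python
def g_not(s, d, h, l):
    """Map a signal to the opposite side of its default d (requires h > d > l)."""
    if s > d:
        return l
    if s < d:
        return h
    return d


def g_and(s1, s2, d1, d2):
    """Ordinary boolean AND of 'signal 1 is raised' and 'signal 2 is raised'."""
    return (s1 > d1) and (s2 > d2)


def g_or(s1, s2, d1, d2):
    """OR expressed as a threshold on the summed signals."""
    return (s1 + s2) > (d1 + d2)


def g_xor(s1, s2, d1, d2, h1, l1, h2, l2):
    """XOR as (S1 AND NOT S2) OR (NOT S1 AND S2), composed from the gates above."""
    left = g_and(s1, g_not(s2, d2, h2, l2), d1, d2)
    right = g_and(g_not(s1, d1, h1, l1), s2, d1, d2)
    return left or right


# Hypothetical defaults and bounds for both signals.
D, H, L = 1.0, 2.0, 0.0
print(g_xor(1.5, 0.5, D, D, H, L, H, L))  # signal 1 raised, signal 2 lowered
print(g_xor(1.5, 1.5, D, D, H, L, H, L))  # both signals raised
```

With these assumed values the XOR gate fires when the two signals are perturbed in opposite directions and stays off when both are raised.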
Appendix A: Proof of main theorem 

We envision the adversary population as evolving by 
Wright-Fisher dynamics against a controller who deploys 
a control system that generates a hypercube fitness land- 
scape as sketched in section III. To analyze the learn- 
ing process of the adversary population, we approximate 
the Wright-Fisher dynamics using a branching process 
model. The adversary population begins with a wild type 
population of individuals that do not perturb the control 
system at all. From this wild type, mutants arise that 
perturb one of the control signals up or down; because 
all single mutants are deleterious, these mutant individuals
step down into the fitness valley at a rate per
generation that is no larger than the product of the
population size and the mutation rate, $N\mu$.

To cross the valley in $k$ dimensions, the individual
needs to found a lineage that accumulates at least $k$
successive mutations and thus this lineage must survive
in the fitness valley for at least $k$ generations. Alternatively,
a lineage could pick up multiple mutations in each
generation, but this won't help. Consider the case that
$w$ mutants arise each time step, and the lineage must
survive for at least $k/w$ generations. When $k$ grows very
large and $w$ is constant, asymptotic analysis tells us that
the individual needs to found a lineage that survives for
at least $\Theta(k)$ generations (because $\lim_{k\to\infty} \left|\frac{k/w}{k}\right| = 1/w$
and $\lim_{k\to\infty} \left|\frac{k}{k/w}\right| = w$). In this case, generating $w$-mutants
each time-step doesn't speed up crossing the fitness
valley appreciably. But what if $w$ is allowed to grow
with $k$? For example, let $w = ck$, where $0 < c < 1$. In
that case we would only require a lineage to survive for
$1/c$ steps, and this is independent of $k$. But the time that
it would take to generate the $w$-mutant required for each
successive step is on average $\frac{1}{\mu^{w}} = \left(\frac{1}{\mu}\right)^{ck}$, where $\mu$ is the
mutation rate. There is no escape from an average time
exponential in $k$.

It is therefore sufficient to determine how likely an individual
who steps down into the valley is to found a
lineage that survives in the valley for at least $k$ generations,
as $k$ grows very large. If this probability decreases
exponentially with $k$, the average time until we get the
first individual who is destined to succeed will increase
exponentially in $k$. To model the fate of such a lineage,
we approximate it as a subcritical branching process (15-17).
If the relative fitness of an individual in the valley
is $0 < \lambda < 1$, this individual will have a Poisson number
of offspring with mean $\lambda$. Each of these offspring
will themselves have Poisson numbers of offspring, again
with mean $\lambda$. First we prove a technical lemma, and
then we show that indeed the average time until we get
the first individual who is destined to succeed increases
exponentially in $k$ as $k$ grows large.
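The exponential decay of the survival probability can be checked numerically by iterating the probability generating function of the Poisson offspring distribution. This is a sketch under stated assumptions: the mean offspring number `lam` and the mutant supply rate `n_mu` are hypothetical values, and the convention that a lineage "lasts k generations" when it still has members k generations after its founder is our indexing choice.

```python
import math


def survival_probabilities(lam, generations):
    """P(lineage still alive after g generations), for g = 0..generations,
    in a Poisson(lam) branching process started from one individual."""
    q = 1.0  # the founder is alive at generation 0
    probs = [q]
    for _ in range(generations):
        # Extinction by the next generation is exp(lam * (F - 1)) with F = 1 - q,
        # so the survival probability updates as q <- 1 - exp(-lam * q).
        q = -math.expm1(-lam * q)
        probs.append(q)
    return probs


def waiting_time_lower_bound(lam, k, n_mu):
    """Expected generations until a valley lineage appears that lasts k
    generations, if valley mutants arise at rate at most n_mu per generation."""
    return 1.0 / (n_mu * survival_probabilities(lam, k)[k])


lam = 0.9    # hypothetical relative fitness in the valley
n_mu = 1e-3  # hypothetical product of population size and mutation rate
for k in (5, 10, 20, 40):
    print(k, f"{waiting_time_lower_bound(lam, k, n_mu):.3e}")
```

For large k the survival probability shrinks by a factor close to `lam` per generation, so the waiting-time bound grows roughly like (1/lam)^k.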



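Lemma 2 $1 - e^{-\lambda} f[e^{\lambda e^{-\lambda}}, k] = |e^{-\lambda} - 1|\,\lambda^{O(k)}$, where
$\lambda < 1$ and $f[b,k]$ is the tetration function of base $b$ with $k$
levels of iterated exponentiation.

Proof First we use the result (Theorem 2 in ref. (18))
that since $\lambda < 1$, $f[e^{\lambda e^{-\lambda}}, k]$ converges to $e^{\lambda}$ at a linear
rate $\lambda$. Thus for any $n$, we have

$$\left| f[e^{\lambda e^{-\lambda}}, n+1] - e^{\lambda} \right| = O\!\left(\lambda \left| f[e^{\lambda e^{-\lambda}}, n] - e^{\lambda} \right|\right).$$

From the above, terminating with $k$ and solving the
recurrence relation, we obtain

$$\left| f[e^{\lambda e^{-\lambda}}, k] - e^{\lambda} \right| = \lambda^{O(k)} \left| f[e^{\lambda e^{-\lambda}}, 0] - e^{\lambda} \right|,$$

and so by the definition of tetration, $|f[e^{\lambda e^{-\lambda}}, k] - e^{\lambda}| = \lambda^{O(k)} |1 - e^{\lambda}|$.
The symmetry property of absolute value
implies that $|e^{\lambda} - f[e^{\lambda e^{-\lambda}}, k]| = \lambda^{O(k)} |1 - e^{\lambda}|$.
Multiply both sides by $e^{-\lambda}$ to obtain

$$\left| 1 - e^{-\lambda} f[e^{\lambda e^{-\lambda}}, k] \right| = \lambda^{O(k)} \left| e^{-\lambda} - 1 \right|,$$

since for $c > 0$, $c|a - b| = |c(a - b)|$. Because
$f_k = e^{-\lambda} f[e^{\lambda e^{-\lambda}}, k]$ is the cumulative distribution function
(CDF) of some distribution (19), this implies that
$0 < f_k < 1$. Therefore $0 < 1 - f_k < 1$, and in particular
$1 - f_k > 0$. Consequently we can simplify the absolute
value as follows:

$$1 - e^{-\lambda} f[e^{\lambda e^{-\lambda}}, k] = \lambda^{O(k)} \left| e^{-\lambda} - 1 \right|.$$

And thus $1 - e^{-\lambda} f[e^{\lambda e^{-\lambda}}, k] = |e^{-\lambda} - 1|\,\lambda^{O(k)}$.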
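The tetration form of the extinction CDF can be checked numerically against direct iteration of the Poisson generating function. The sketch below is illustrative: the value of `lam` is hypothetical, the function names are ours, and the offset between the tetration height and the number of generations (height n corresponds to n + 1 applications of the generating function) is an indexing assumption on our part.

```python
import math


def tetration(b, k):
    """f[b, k]: a tower of k copies of b, with f[b, 0] = 1."""
    value = 1.0
    for _ in range(k):
        value = b ** value
    return value


def extinction_cdf(lam, n):
    """Extinction probability after n + 1 applications of the Poisson
    generating function s -> exp(lam * (s - 1)), starting from s = 0."""
    f = 0.0
    for _ in range(n + 1):
        f = math.exp(lam * (f - 1.0))
    return f


lam = 0.7  # hypothetical mean offspring number, below 1
base = math.exp(lam * math.exp(-lam))
for k in (0, 1, 5, 20):
    via_tower = math.exp(-lam) * tetration(base, k)
    print(k, via_tower, extinction_cdf(lam, k), 1.0 - via_tower)
```

The two columns agree to floating-point precision, the tower approaches exp(lam) as its height grows, and the remaining survival probability shrinks geometrically, consistent with the lemma.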



Lemma 3 As $k$ grows large, the average time until we
get the first individual who is destined to succeed, assuming
its lineage is modeled by a subcritical Poisson branching
process (with average number of offspring $0 < \lambda < 1$),
is at least $(1/\lambda)^{O(k-1)}/(N\mu)$ generations.

Proof By the subcriticality of the branching process that
generates the lineage, $\lambda < 1$. The random variable $Y$ is
the number of generations the lineage generated by the
branching process survives. The CDF of the branching
process, $f_n = P(Y \le n)$, gives us the probability of a
lineage surviving no more than $n$ generations. Therefore,
for our purposes, we need to characterize the probability
of non-extinction for $k$ generations, which means we must
characterize $P(Y > k - 1) = 1 - f_{k-1}$.

By equation 2.5 in Farrington and Grant (19), when
$\lambda < 1$, $f_n = e^{-\lambda} f[e^{\lambda e^{-\lambda}}, n]$, where $f[b,k]$ is the tetration
function. Therefore, by Lemma 2, $P(Y > k - 1) = |e^{-\lambda} - 1|\,\lambda^{O(k-1)}$.
The number of mutants who must enter
the valley before one does so successfully is on average
$1/P(Y > k - 1)$, which is thus at least $(1/\lambda)^{O(k-1)}$ since
it is easy to show that $0 < |e^{-\lambda} - 1| < 1$. Since the rate
at which mutants are produced from the wild type each
generation is upper bounded by $N\mu$, the result follows.



Now to prove the main Theorem, we combine Lemma 
3 with a suitable control logic: 

Theorem 1 There exists a control logic on $n$ signals
with circuit complexity $O(n)$ that is learnable only in a number of
generations exponential in $n$.

Proof Without loss of generality we will assume that the
controller response with deleterious consequences to the
adversaries is $R_1$. What this means is that the fitness of
the adversary is $1 - \Delta R_1$.

The adversary has three possible actions: upregulate,
downregulate, or do nothing. We will now build a control
logic that downregulates $R_1$ only if the adversary upregulates
or downregulates each and every one of the signals
(rather than doing nothing), and otherwise upregulates
$R_1$. We will then show that for small enough adversary
populations, generating an $n$-mutant takes time exponential
in $n$.

The control logic $L$ we consider is simply:

1. IF $D(\Delta x_1) \wedge \ldots \wedge D(\Delta x_n)$, $\Delta R_1 = +0$

2. IF $D(\Delta x_1) \vee \ldots \vee D(\Delta x_n)$, $\Delta R_1 = \delta$

3. ELSE $\Delta R_1 = -s$

where $D(x) = (x \Leftrightarrow \bullet)$ and $0 < \delta < 1$. Here $\Leftrightarrow$ represents
the logical equivalence operator in Kleene's logic. Generating
an optimal type requires a beneficial $n$-mutant
with deleterious intermediates, and by Lemma 3, this
takes at least on average $(1/\lambda)^{O(n-1)}/(N\mu)$ generations,
for $0 < \lambda < 1$. Since $\lambda$ is the relative fitness of an individual
in the fitness valley, $\lambda = 1 - \delta$ by line 2 of $L$.



Now note that the first line of $L$ has $n$ clauses of the
form $\Delta x \Leftrightarrow \bullet$, each of which has 1 gate, connected
by $n - 1$ AND gates. In total then there are $n + n - 1 = 2n - 1$
gates on the first line of $L$ (and this is maximal). Hence the
circuit complexity of $L$ is $O(n)$, and the result follows.



